xgen-mm-vid-phi3-mini-r-v1.5-128tokens-8frames
xGen-MM-Vid (BLIP-3-Video) is an efficient, compact vision-language model with an explicit temporal encoder, designed specifically for understanding video content.
Tags: Video-to-Text, Safetensors, English
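This model uses an explicit temporal encoder, and its name suggests 8 input frames and 128 output tokens. One way to picture that role is as a module that compresses all per-frame visual tokens into a fixed budget of video tokens. The sketch below shows a generic learnable-query attentional pooling step in pure Python. It is a hypothetical illustration and not this model's actual encoder. The function name `attention_pool`, the 16 tokens per frame, and the 32-dimensional vectors are assumptions made for the example. In a trained model the queries would be learned parameters, but here they are random.

```python
import math
import random


def attention_pool(frame_tokens, queries):
    """Hypothetical attentional pooling: compress all frame tokens into len(queries) tokens.

    frame_tokens: list of frames, each a list of token vectors (lists of floats).
    queries: list of query vectors with the same dimension as the tokens.
    Returns one pooled vector per query.
    """
    # Flatten every frame into one spatio-temporal token sequence.
    tokens = [tok for frame in frame_tokens for tok in frame]
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    pooled = []
    for q in queries:
        # Scaled dot-product score of this query against every token.
        scores = [scale * sum(a * b for a, b in zip(q, t)) for t in tokens]
        # Numerically stable softmax over all tokens from all frames.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # The attention-weighted average of the tokens becomes one output token.
        pooled.append([sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)])
    return pooled


if __name__ == "__main__":
    random.seed(0)
    # Assumed sizes: 8 frames and 128 output tokens follow the model name.
    # 16 tokens per frame and dim 32 are illustrative only.
    num_frames, tokens_per_frame, dim, num_out = 8, 16, 32, 128
    frames = [
        [[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens_per_frame)]
        for _ in range(num_frames)
    ]
    # Random stand-ins for what would be learned query vectors.
    queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_out)]
    video_tokens = attention_pool(frames, queries)
    print(len(video_tokens), len(video_tokens[0]))  # 128 32
```

With pooling like this, the number of tokens passed on to the language model stays fixed at the number of queries (128 here), however many tokens the frames produce. This is the kind of budget the model name describes.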